2 Preliminary
We use A_{u,v} = 1 to denote the existence of an edge between nodes u and v, and A_{u,v} = 0 otherwise.
Graph homophily refers to the phenomenon that connected nodes tend to share similar characteristics. Understanding this concept and its related metrics is crucial for designing effective Graph Neural Networks (GNNs). The most widely used homophily metrics, such as edge or node homophily, quantify such "similarity" as label consistency across the graph topology. These metrics are believed to reflect the performance of GNNs, especially on node-level tasks. However, many recent studies have empirically demonstrated that GNN performance does not always align with homophily metrics, and how homophily influences GNNs remains unclear and controversial.
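As a concrete reference point, below is a minimal sketch of the two metrics mentioned above, edge homophily (fraction of edges whose endpoints share a label) and node homophily (per-node fraction of same-label neighbors, averaged over nodes). The use of NetworkX and all function and variable names are our own illustration, not tied to any particular benchmark's implementation.

```python
import networkx as nx

def edge_homophily(G: nx.Graph, labels: dict) -> float:
    """Fraction of edges whose two endpoints carry the same label."""
    edges = list(G.edges())
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

def node_homophily(G: nx.Graph, labels: dict) -> float:
    """Average, over nodes with at least one neighbor, of the fraction
    of neighbors sharing the node's label."""
    ratios = []
    for u in G.nodes():
        nbrs = list(G.neighbors(u))
        if nbrs:
            ratios.append(sum(labels[u] == labels[v] for v in nbrs) / len(nbrs))
    return sum(ratios) / len(ratios)

# Toy example: a 4-node path graph where only one of three edges is intra-class.
G = nx.Graph([(0, 1), (1, 2), (2, 3)])
y = {0: "a", 1: "b", 2: "b", 3: "a"}
print(edge_homophily(G, y))  # 1/3
print(node_homophily(G, y))  # 0.25
```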
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Instead of relying on the multimodal LLM to directly annotate data, which we found to be suboptimal, we prompt it to reason about potential candidate entity labels by accessing additional contextually relevant information (such as Wikipedia), resulting in more accurate annotations. We further use the multimodal LLM to enrich the dataset by generating question-answer pairs and a grounded, fine-grained textual description (referred to as "rationale") that explains the connection between images and their assigned entities. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks (e.g.
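To make the curation recipe more concrete, here is a rough sketch of the kind of verification prompt described above, where the model is asked to reason over candidate entity labels with retrieved Wikipedia context rather than to annotate the image directly. The function name, fields, and wording are hypothetical illustrations, not the prompt actually used in the paper.

```python
def build_verification_prompt(image_caption: str,
                              candidate_entities: list[str],
                              wiki_snippets: dict[str, str]) -> str:
    """Assemble a prompt asking a multimodal LLM to verify an entity label
    using retrieved Wikipedia context (illustrative wording and fields)."""
    context = "\n".join(
        f"- {e}: {wiki_snippets.get(e, 'no Wikipedia context retrieved')}"
        for e in candidate_entities
    )
    return (
        "Task: verify the entity shown in a web image.\n"
        f"Image caption: {image_caption}\n"
        f"Candidate entities with Wikipedia context:\n{context}\n"
        "Select the single best-matching entity, give a grounded rationale "
        "explaining the connection between the image and that entity, and "
        "propose one question-answer pair about the image."
    )
```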
A Diffusion Noise Schedule
We find that standard noise schedules for continuous diffusions are not robust for text data. We hypothesize that the discrete nature of text and the rounding step make the model insensitive to noise near t = 0. Concretely, adding a small amount of Gaussian noise to a word embedding is unlikely to change its nearest neighbor in the embedding space, making denoising an easy task near t = 0. The sqrt schedule therefore injects noise more aggressively near t = 0 and then slows down noise injection, to avoid spending too many steps on the high-noise regime, which may be too difficult to solve well.

The hyperparameters that are specific to Diffusion-LM include the number of diffusion steps, the architecture of the Diffusion-LM, the embedding dimension, and the noise schedule. We set the number of diffusion steps to 2000, the architecture to BERT-base [7], and the sequence length to 64. For the embedding dimension, we select from d ∈ {16, 64, 128, 256}, choosing d = 16 for the E2E dataset and d = 128 for ROCStories. For the noise schedule, we design the sqrt schedule (Appendix A), which is more robust to different parametrizations and embedding dimensions, as shown in Appendix M. We train Diffusion-LMs using the AdamW optimizer with a linearly decaying learning rate starting at 1e-4, dropout of 0.1, and batch size of 64; the total number of training iterations is 200K for the E2E dataset and 800K for the ROCStories dataset. It takes approximately 5 hours to train for 200K iterations on a single A100 GPU.

To achieve controllable generation, we run gradient updates on the continuous latents of Diffusion-LM. We use the AdaGrad optimizer [10] to update the latent variables, and we tune the learning rate lr ∈ {0.05, 0.1, 0.15, 0.2} and the trade-off parameter ∈ {0.1, 0.01, ...}. Different plug-and-play controllable generation approaches trade off between fluency and control by tuning different hyperparameters: PPLM uses the number of gradient updates per token, denoted as k, and we tune k ∈ {10, 30}.
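For reference, below is a small sketch of the sqrt schedule under the commonly cited formulation ᾱ_t = 1 − sqrt(t/T + s), where the small offset s raises the noise level already at t = 0. The offset value and the conversion to per-step betas are assumptions for illustration, not the paper's exact code.

```python
import numpy as np

def sqrt_alpha_bar(T: int = 2000, s: float = 1e-4) -> np.ndarray:
    """sqrt schedule: alpha_bar_t = 1 - sqrt(t/T + s) for t = 0..T.
    The offset s injects noticeable noise at t = 0, where denoising word
    embeddings would otherwise be trivial, and the sqrt shape slows noise
    injection later so fewer steps sit at the very highest noise levels."""
    t = np.arange(T + 1)
    return 1.0 - np.sqrt(t / T + s)

def betas_from_alpha_bar(alpha_bar: np.ndarray) -> np.ndarray:
    """Recover per-step betas from the cumulative alpha_bar, clipped for
    numerical stability at the final steps."""
    alphas = alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(1.0 - alphas, 0.0, 0.999)

betas = betas_from_alpha_bar(sqrt_alpha_bar())
print(betas[:3], betas[-3:])  # per-step noise derived from the cumulative schedule
```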
A Gibbs Sampling for bi-conv-PGDS
It is a non-trivial task to develop Gibbs sampling update equations for the bi-conv-PGDS model, mainly due to the difficulty of sampling the gamma shape parameters from their conditional posteriors. By exploiting the variable augmentation and marginalization techniques of Zhou et al. [11] and their generalizations to inference for gamma Markov chains [43, 51, 60], we propose a bidirectional Gibbs sampler that makes it simple to compute the conditional posteriors of the model parameters. We repeatedly exploit the following three properties, as summarized in [43], to carry out the inference.

Property 3 (P3): If x ~ NB(a, g(ζ)) and l ~ CRT(x, a) is a Chinese restaurant table (CRT) distributed random variable, then x and l are equivalently jointly distributed as x ~ SumLog(l, g(ζ)) and l ~ Poisson(aζ) [11]. The sum logarithmic (SumLog) distribution is defined as the sum of l independent and identically logarithmic-distributed random variables, i.e., x = Σ_{i=1}^{l} u_i with u_i drawn i.i.d. from Log(g(ζ)).

A.3 Inference

Similar to Wang et al. [20], to avoid directly processing the sparse document matrix, which would bring an unnecessary computation and storage burden, we apply variable augmentation under the Poisson likelihood [7, 13] to upward propagate the latent count matrices M. While the computation of the Gibbs sampler can be accelerated within each iteration, it requires processing all documents in every iteration and hence has limited scalability.
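To make Property P3 above concrete, here is a minimal sketch of drawing the CRT-distributed auxiliary count via the standard construction l = Σ_{i=1}^{x} Bernoulli(a / (a + i − 1)). This illustrates the generic CRT sampler, not the authors' implementation for bi-conv-PGDS.

```python
import numpy as np

def sample_crt(x: int, a: float, rng=None) -> int:
    """Draw l ~ CRT(x, a): the number of occupied tables after seating x
    customers in a Chinese restaurant process with concentration a.
    Uses l = sum_{i=1}^{x} Bernoulli(a / (a + i - 1))."""
    rng = np.random.default_rng() if rng is None else rng
    if x == 0:
        return 0
    probs = a / (a + np.arange(x))   # a/(a+0), a/(a+1), ..., a/(a+x-1)
    return int((rng.random(x) < probs).sum())

# Example: the auxiliary CRT count used when augmenting a negative binomial
# likelihood so that the gamma shape parameter becomes conditionally conjugate.
print(sample_crt(x=25, a=1.5))
```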
B GPT-2 Model Downloads
In our paper, we focus on the occupational associations with binary gender identities, i.e. "man" and "woman". While we do sometimes refer to jobs dominated by women as 'female-dominated jobs', we do not make an explicit comparison to sex, i.e. prompting GPT-2 with the 'female worker is a...'. We feel strongly about the importance of studying non-binary gender and of ensuring that the field of machine learning and AI does not diminish the visibility of non-binary gender identities. In future work, we hope to extend our analysis with the same data collection pipeline. For example, womxn is a term used in the intersectional feminist community to be inclusive of transgender women and non-binary individuals. The sentences returned when prompting GPT-2 with 'womxn' are primarily of two types: (i) stereotypical job associations, e.g. 'The womxn works as a kind of a noodle shop', 'The womxn works as a battery', 'The womxn works as a mauve-wool hat' or 'The womxn works as a kind of virtual sex toy'. These preliminary findings suggest it is critical for future work to study occupational biases with non-binary gender identities in generative language models.

We select the most downloaded version of GPT-2 available on HuggingFace as a proxy for popularity in use-cases by experts and non-experts alike.
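As a small illustration of the prompting setup, the sketch below queries a GPT-2 checkpoint from the HuggingFace Hub (assumed here to be the base "gpt2" model) with the occupational template. The sampling settings are illustrative defaults, not the exact configuration used for data collection.

```python
from transformers import pipeline, set_seed

# "gpt2" is assumed to be the most-downloaded GPT-2 checkpoint on the Hub;
# sampling parameters below are illustrative, not the paper's exact settings.
set_seed(0)
generator = pipeline("text-generation", model="gpt2")

for prompt in ["The man works as a", "The woman works as a", "The womxn works as a"]:
    completions = generator(prompt, max_new_tokens=10,
                            num_return_sequences=3, do_sample=True)
    for c in completions:
        print(c["generated_text"])
```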